Rhizobium leguminosarum bv. trifolii WSM2012 (syn. MAR1468) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an ineffective root nodule recovered from the roots of the annual clover Trifolium rueppellianum Fresen growing in Ethiopia. WSM2012 has a narrow, specialized host range for N2-fixation. Here we describe the features of R. leguminosarum bv. trifolii strain WSM2012, together with genome sequence information and annotation. The 7,180,565 bp high-quality-draft genome is arranged into 6 scaffolds of 68 contigs, contains 7,080 protein-coding genes and 86 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program.



Introduction 

Atmospheric dinitrogen (N2) is fixed by specialized soil bacteria (root nodule bacteria or rhizobia) that form non-obligatory symbiotic relationships with legumes. The complex, highly evolved legume symbioses involve the formation of specialized root structures (nodules) as a consequence of a tightly controlled, mutually gene-regulated infection process that results in substantial morphological changes in both the legume host root and infecting rhizobia [1]. When housed within root nodules, fully effective N2-fixing bacteroids (the N2-fixing form of rhizobia) can provide 100% of the nitrogen (N) requirements of the legume host by symbiotic N2-fixation.

Currently, N2-fixation provides ~40 million tonnes of nitrogen (N) annually to support global food production from ~300 million hectares of crop, forage and pasture legumes in symbioses with rhizobia [2]. The most widely cultivated pasture legumes belong to the genus Trifolium (clovers). This genus inhabits three distinct centers of biodiversity with approximately 28% of species in the Americas, 57% in Eurasia and 15% in Sub-Saharan Africa [3]. A smaller subset of about 30 species, almost all of Eurasian origin, is widely grown as annual and perennial species in pasture systems in Mediterranean and temperate regions [3]. Globally important, commonly cultivated perennial species include T. repens (white clover), T. pratense (red clover), T. fragiferum (strawberry clover) and T. hybridum (alsike clover). Trifolium rueppellianum is an important annual self-pollinating species grown in the central African continent as a food and forage legume.

Clovers usually form N2-fixing symbioses with the common soil bacterium Rhizobium leguminosarum bv. trifolii, and different combinations of Trifolium spp. hosts and strains of R. leguminosarum bv. trifolii can vary markedly in symbiotic compatibility [4], resulting in a broad range of symbiotic development outcomes ranging from ineffective (non-nitrogen fixing) nodulation to fully effective N2-fixing partnerships [5].

Rhizobium leguminosarum bv. trifolii strain WSM2012 (syn. MAR1468) has a narrow, specialized host range for N2-fixation [6] and was isolated from a nodule recovered from the roots of the annual clover T. rueppellianum growing in Ethiopia in 1963. This strain is a good representative of one of the six centers of biodiversity, Africa, and can be used to investigate the evolution and biodiversity of R. leguminosarum bv. trifolii strains [6]. Here we present a preliminary description of the general features for R. leguminosarum bv. trifolii strain WSM2012 together with its genome sequence and annotation.

Classification and general features 

R. leguminosarum bv. trifolii strain WSM2012 is a motile, Gram-negative rod (Figure 1 Left and Center) in the order Rhizobiales of the class Alphaproteobacteria. It is fast growing, forming colonies within 3-4 days when grown on half Lupin Agar (½LA) [7] at 28°C. Colonies on ½LA are white-opaque, slightly domed, moderately mucoid with smooth margins (Figure 1 Right). Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of R. leguminosarum bv. trifolii strain WSM2012 in a 16S rRNA sequence based tree. This strain clusters closest to Rhizobium leguminosarum bv. trifolii T24 and Rhizobium leguminosarum bv. phaseoli RRE6 with 99.9% and 99.8% sequence identity, respectively.

Symbiotaxonomy 

R. leguminosarum bv. trifolii WSM2012 nodulates (Nod+) and fixes N2 effectively (Fix+) with both the African annual clover T. mattirolianum Chiov. and the African perennial clovers T. cryptopodium Steud. ex A. Rich and T. usamburense Taub [6]. WSM2012 is Nod+ Fix- with the Mediterranean annual clovers T. subterraneum L. and T. glanduliferum Boiss. and with both the African perennial clover T. africanum Ser. and the African annual clovers T. decorum Chiov. and T. steudneri Schweinf [1,26]. WSM2012 does not nodulate (Nod-) with the Mediterranean annual clover T. glanduliferum Prima or the South American perennial clover T. polymorphum Poir [6].

Genome sequencing and annotation information

Genome project history 

This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [25] and an improved-high-quality-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.

Table 1. Classification and general features of Rhizobium leguminosarum bv. trifolii WSM2012 according to the MIGS recommendations [8]

MIGS ID     Property                 Term                                           Evidence code
            Current classification   Domain Bacteria                                TAS [9]
                                     Phylum Proteobacteria                          TAS [10]
                                     Class Alphaproteobacteria                      TAS [11,12]
                                     Order Rhizobiales                              TAS [12,13]
                                     Family Rhizobiaceae                            TAS [14,15]
                                     Genus Rhizobium                                TAS [14,16-19]
                                     Species Rhizobium leguminosarum bv. trifolii   TAS [14,16,19,20]
            Gram stain               Negative                                       IDA
            Cell shape               Rod                                            IDA
            Motility                 Motile                                         IDA
            Sporulation              Non-sporulating                                NAS
            Temperature range        Mesophile                                      NAS
            Optimum temperature      28°C                                           NAS
            Salinity                 Non-halophile                                  NAS
MIGS-22     Oxygen requirement       Aerobic                                        NAS
            Carbon source            Varied                                         IDA
            Energy source            Chemoorganotroph                               NAS
MIGS-6      Habitat                  Soil, root nodule, on host                     IDA
MIGS-15     Biotic relationship      Free living, symbiotic                         IDA
MIGS-14     Pathogenicity            Non-pathogenic                                 NAS
            Biosafety level          1                                              TAS [21]
            Isolation                Root nodule                                    IDA
MIGS-4      Geographic location      Ethiopia                                       IDA
MIGS-5      Nodule collection date   April 1963                                     IDA
MIGS-4.1    Longitude                40.209961                                      IDA
MIGS-4.2    Latitude                 9.215982                                       IDA
MIGS-4.3    Depth                    Not recorded
MIGS-4.4    Altitude                 Not recorded

Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [22].

Figure 2. Phylogenetic tree showing the relationship of Rhizobium leguminosarum bv. trifolii WSM2012 (shown in blue print) with some of the root nodule bacteria in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,306 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [23]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis [24] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [25] are in bold print and the GOLD ID is mentioned after the accession number. Published genomes are indicated with an asterisk.

Table 2. Genome sequencing project information for Rhizobium leguminosarum bv. trifolii strain WSM2012.

MIGS ID     Property               Term
MIGS-31     Finishing quality      Improved high-quality draft
MIGS-28     Libraries used         Illumina GAii shotgun and paired end 454 libraries
MIGS-29     Sequencing platforms   Illumina, 454 GS FLX Titanium technologies
MIGS-31.2   Sequencing coverage    7.4× 454 paired end, 300× Illumina
MIGS-30     Assemblers             Velvet 1.0.13, Newbler 2.3, phrap 4.24
MIGS-32     Gene calling methods   Prodigal 1.4, GenePRIMP
            GOLD ID                Gi06480
            NCBI project ID        65301
            Database: IMG          2509276033
            Project relevance      Symbiotic N2 fixation, agriculture



Growth conditions and DNA isolation 

Rhizobium leguminosarum bv. trifolii strain 
WSM2012 was grown to mid logarithmic phase in 
TY rich medium [27] on a gyratory shaker at 28°C. 
DNA was isolated from 60 ml of cells using a CTAB 
(Cetyl trimethyl ammonium bromide) bacterial 
genomic DNA isolation method [28]. 

Genome sequencing and assembly 

The genome of Rhizobium leguminosarum bv. trifolii strain WSM2012 was sequenced at the Joint Genome Institute (JGI) using a combination of Illumina [29] and 454 technologies [30]. An Illumina GAii shotgun library, which produced 63,969,346 reads totaling 4,861.7 Mb, and a paired end 454 library with an average insert size of 8 Kb, which produced 428,541 reads totaling 92.6 Mb of 454 data, were generated for this genome. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI user homepage [28]. The initial draft assembly contained 158 contigs in 6 scaffolds. The 454 paired end data were assembled with Newbler, version 2.3. The Newbler consensus sequences were computationally shredded into 2 Kb overlapping fake reads (shreds). Illumina sequencing data were assembled with Velvet, version 1.0.13 [31], and the consensus sequences were computationally shredded into 1.5 Kb overlapping fake reads (shreds). The 454 Newbler consensus shreds, the Illumina Velvet consensus shreds and the read pairs in the 454 paired end library were integrated using parallel phrap, version SPS-4.24 (High Performance Software, LLC). The software Consed [32-34] was used in the following finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alla Lapidus, unpublished). Possible mis-assemblies were corrected using gapResolution (Cliff Han, unpublished), Dupfinisher [35], or sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR (J-F Cheng, unpublished) primer walks. A total of 167 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The estimated genome size is 6.7 Mb and the final assembly is based on 49.8 Mb of 454 draft data, which provides an average 7.4× coverage of the genome, and 2,010 Mb of Illumina draft data, which provides an average 300× coverage of the genome.
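
The coverage figures quoted above follow from dividing the amount of draft sequence data by the estimated genome size. The short Python sketch below reproduces that arithmetic from the numbers given in the text; the function name `fold_coverage` is a hypothetical helper introduced here for illustration and is not part of any JGI tool.

```python
def fold_coverage(sequenced_mb: float, genome_mb: float) -> float:
    """Average fold coverage = total sequenced bases / genome size (same units)."""
    if genome_mb <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_mb / genome_mb


# Values taken from the text: estimated genome size and draft data per platform.
ESTIMATED_GENOME_MB = 6.7
DRAFT_454_MB = 49.8
DRAFT_ILLUMINA_MB = 2010.0

cov_454 = fold_coverage(DRAFT_454_MB, ESTIMATED_GENOME_MB)
cov_illumina = fold_coverage(DRAFT_ILLUMINA_MB, ESTIMATED_GENOME_MB)

print(f"454 paired end: {cov_454:.1f}x")   # 7.4x
print(f"Illumina:       {cov_illumina:.0f}x")  # 300x
```

Note that the calculation uses the estimated genome size (6.7 Mb), not the final assembly length of 7,180,565 bp.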

Genome annotation 

Genes were identified using Prodigal [36] as part of the DOE-JGI Annotation pipeline [37], followed by a round of manual curation using the JGI GenePRIMP pipeline [38]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [39], RNAmmer [40], Rfam [41], TMHMM [42], and SignalP [43]. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [44].

Genome properties

The genome is 7,180,565 nucleotides with 60.89% GC content (Table 3) and comprises 6 scaffolds (Figure 3) of 68 contigs. From a total of 7,166 genes, 7,080 were protein encoding and 86 were RNA-only encoding genes. The majority of genes (72.87%) were assigned a putative function while the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 4.



Table 3. Genome statistics for Rhizobium leguminosarum bv. trifolii WSM2012

Attribute                           Value        % of Total
Genome size (bp)                    7,180,565    100.00
DNA coding region (bp)              6,196,449    86.29
DNA G+C content (bp)                4,372,528    60.89
Number of scaffolds                 6
Number of contigs                   68
Total genes                         7,166        100.00
RNA genes                           86           1.20
rRNA operons*                       3
Protein-coding genes                7,080        98.80
Genes with function prediction      5,222        72.87
Genes assigned to COGs              5,682        79.29
Genes assigned Pfam domains         5,892        82.22
Genes with signal peptides          615          8.58
Genes with transmembrane helices    1,617        22.56
CRISPR repeats                      0

*1 extra 5S rRNA gene
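
As a consistency check, the "% of Total" column in Table 3 can be recomputed from the raw values: the base-pair rows are taken as a share of the genome size and the gene rows as a share of the total gene count. The sketch below is illustrative only; the dictionary and function names are introduced here and do not come from the IMG platform.

```python
GENOME_SIZE_BP = 7_180_565
TOTAL_GENES = 7_166

# Rows measured in base pairs (denominator: genome size).
bp_rows = {
    "DNA coding region": 6_196_449,
    "DNA G+C content": 4_372_528,
}

# Rows measured in genes (denominator: total gene count).
gene_rows = {
    "RNA genes": 86,
    "Protein-coding genes": 7_080,
    "Genes with function prediction": 5_222,
    "Genes assigned to COGs": 5_682,
    "Genes assigned Pfam domains": 5_892,
    "Genes with signal peptides": 615,
    "Genes with transmembrane helices": 1_617,
}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


bp_pct = {name: percent(n, GENOME_SIZE_BP) for name, n in bp_rows.items()}
gene_pct = {name: percent(n, TOTAL_GENES) for name, n in gene_rows.items()}

for name, value in {**bp_pct, **gene_pct}.items():
    print(f"{name:<34}{value:>7.2f}")
```

The recomputed values agree with the table, and the protein-coding and RNA gene counts (7,080 + 86) sum to the reported total of 7,166 genes.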

Figure 3. Graphical map of the genome of Rhizobium leguminosarum bv. trifolii strain WSM2012. From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.

Table 4. Number of protein coding genes of Rhizobium leguminosarum bv. trifolii WSM2012 associated with the general COG functional categories.

Code   Value   %age    COG Category
J      206     3.25    Translation, ribosomal structure and biogenesis
A      0       0.00    RNA processing and modification
K      619     9.76    Transcription
L      237     3.74    Replication, recombination and repair
B      2       0.03    Chromatin structure and dynamics
D      48      0.76    Cell cycle control, mitosis and meiosis
Y      0       0.00    Nuclear structure
V      77      1.21    Defense mechanisms
T      330     5.20    Signal transduction mechanisms
M      335     5.28    Cell wall/membrane biogenesis
N      85      1.34    Cell motility
Z      1       0.02    Cytoskeleton
W      0       0.00    Extracellular structures
U      108     1.70    Intracellular trafficking, secretion and vesicular transport
O      187     2.95    Posttranslational modification, protein turnover, chaperones
C      327     5.16    Energy production and conversion
G      636     10.03   Carbohydrate transport and metabolism
E      716     11.29   Amino acid transport and metabolism
F      107     1.69    Nucleotide transport and metabolism
H      215     3.39    Coenzyme transport and metabolism
I      214     3.37    Lipid transport and metabolism
P      311     4.90    Inorganic ion transport and metabolism
Q      154     2.43    Secondary metabolite biosynthesis, transport and catabolism
R      802     12.65   General function prediction only
S      625     9.85    Function unknown
-      1,484   20.71   Not in COGs
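
The text does not state which denominator the "%age" column of Table 4 uses. The sketch below tests one reading that appears consistent with the published numbers: percentages for the lettered categories are relative to the sum of all category assignments (a gene may fall into more than one category), whereas the "Not in COGs" row is relative to the total gene count from Table 3. This is an inference from the arithmetic, not a documented rule, and the variable names are introduced here for illustration.

```python
# Counts per COG category, as listed in Table 4.
cog_counts = {
    "J": 206, "A": 0, "K": 619, "L": 237, "B": 2, "D": 48, "Y": 0,
    "V": 77, "T": 330, "M": 335, "N": 85, "Z": 1, "W": 0, "U": 108,
    "O": 187, "C": 327, "G": 636, "E": 716, "F": 107, "H": 215,
    "I": 214, "P": 311, "Q": 154, "R": 802, "S": 625,
}

# Values from Table 3.
TOTAL_GENES = 7_166
GENES_ASSIGNED_TO_COGS = 5_682

# Assumed denominator for lettered categories: total category assignments.
total_assignments = sum(cog_counts.values())
cog_pct = {
    code: round(100.0 * n / total_assignments, 2)
    for code, n in cog_counts.items()
}

# "Not in COGs" is the remainder of the gene set, as a share of all genes.
not_in_cogs = TOTAL_GENES - GENES_ASSIGNED_TO_COGS
not_in_cogs_pct = round(100.0 * not_in_cogs / TOTAL_GENES, 2)

print(f"Total COG assignments: {total_assignments}")
print(f"Not in COGs: {not_in_cogs} ({not_in_cogs_pct:.2f}%)")
for code, pct in cog_pct.items():
    print(f"{code}  {cog_counts[code]:>4}  {pct:>6.2f}")
```

Under this reading the category counts sum to 6,342 assignments across 5,682 genes, which is why the lettered percentages and the "Not in COGs" percentage do not add up to 100.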